Full Data Ingest

In this notebook we will show the data available in the WorldFloods data repository and how it is derived.

Overview

In this notebook we will:

  1. Examine a specific flood event

  2. Build Flood MetaData and Floodmap using

    • area of interest (AOI)

    • observed flood event

    • hydrography areas

    • hydrography lines

  3. Retrieve Sentinel-2 and JRC Permanent Water for a Flood AOI

1 - Flood Events

The Copernicus Emergency Management Service (EMS) provides warnings and risk assessments for floods and other natural disasters through geospatial data derived from satellite imagery. It provides data before, during, and after events, in Rapid Mapping format for emergencies and in Risk and Recovery format for prevention. In WorldFloods we are primarily interested in the satellite-derived imagery, with a focus on fluvial (river) floods.

Copernicus EMS Rapid Activations

The Copernicus EMS activation mappings for individual events, each identified by a unique EMSR code, are listed in the Rapid Activations table on the Copernicus EMS website. Each EMSR code in that table links to a webpage for the event, which hosts vector zip files from various sources. Each vector zip file may contain multiple products for the event. In this notebook we walk through a single EMSR flood event, from its alert code to its associated URLs.

We can retrieve the EMS Rapid Activations table as a pandas DataFrame using the function table_floods_ems from the activations module in ml4floods/src/data/copernicusEMS. The function lets you specify how far back in time to retrieve flood emergency mappings. Be careful when choosing dates before 23 June 2015, when Sentinel-2 was launched: no Sentinel-2 imagery exists for earlier events.
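
One way to guard against the date caveat above is to clamp the requested start date before passing it on. This is a minimal sketch; SENTINEL2_LAUNCH and clamp_start_date are hypothetical names introduced here and are not part of ml4floods.

```python
from datetime import date

# Sentinel-2 launch date, as stated in the text above.
SENTINEL2_LAUNCH = date(2015, 6, 23)


def clamp_start_date(event_start_date: str) -> str:
    """Return the later of the requested date and the Sentinel-2 launch date.

    Hypothetical helper: dates are ISO strings (YYYY-MM-DD), the same format
    this notebook passes as event_start_date.
    """
    requested = date.fromisoformat(event_start_date)
    return max(requested, SENTINEL2_LAUNCH).isoformat()


print(clamp_start_date("2014-01-01"))  # 2015-06-23
print(clamp_start_date("2021-01-01"))  # 2021-01-01
```

The clamped string could then be passed as event_start_date in the call shown further down.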

from pyprojroot import here
import sys
import os
root = here(project_files=[".here"])
sys.path.append(str(root))
from src.data.copernicusEMS import activations
from src.data import utils
from pathlib import Path

Let’s take a look at the flood events that have occurred since the start of 2021.

table_activations_ems = activations.table_floods_ems(event_start_date="2021-01-01")
table_activations_ems
Code     Title                                      CodeDate    Type   Country
EMSR504  Floods, Australia                          2021-03-19  Flood  Australia
EMSR502  Flood in Southern Ireland                  2021-02-23  Flood  Ireland
EMSR501  Flood in Albania                           2021-01-06  Flood  Albania
EMSR498  Flood in Corrèze department, France        2021-02-02  Flood  France
EMSR497  Flood in Germany                           2021-02-01  Flood  Germany
EMSR496  Flood in Lazio Region, Italy               2021-01-26  Flood  Italy
EMSR495  Tropical cyclone Eloise in Mozambique,...  2021-01-22  Storm  Mozambique, Eswatini, Zimbabwe
EMSR492  Flood in Landes, France                    2021-01-01  Flood  France

EMSR501: Flood in Shkodra, Albania

During heavy rains, 4000 hectares of land were flooded, affecting 200 people. We take this event as an example to show how we retrieve and ingest data as an entry point to the preprocessed data used in the WorldFloods machine learning pipeline.

Retrieve the Zip File URLs for a given EMSR Code

The Copernicus EMS Activation Mapping URLs for the January 6th flood event in Shkodra, Albania can be fetched by passing the activation code “EMSR501” to the function fetch_zip_file_urls from the activations module, which returns the URL locations as a list of strings. Each of these zip file URLs lets us retrieve the vector shapefiles for an area of interest within a single activation code.

emsr_code = "EMSR501"
zip_files_activation_url_list = activations.fetch_zip_file_urls(emsr_code)
zip_files_activation_url_list
['https://emergency.copernicus.eu/mapping/download/184632/EMSR501_AOI01_DEL_MONIT02_r1_VECTORS_v1_vector.zip',
 'https://emergency.copernicus.eu/mapping/download/184615/EMSR501_AOI01_DEL_MONIT01_r1_VECTORS_v1_vector.zip',
 'https://emergency.copernicus.eu/mapping/download/184606/EMSR501_AOI01_DEL_PRODUCT_r1_VECTORS_v1_vector.zip']
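
The file names in these URLs encode the activation code, the AOI, and the product. The sketch below splits a URL into those parts with a regular expression. The pattern is inferred only from the three names listed above, so other activations may use variants it does not cover. parse_zip_url and the group names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Pattern inferred from the three EMSR501 file names shown above (assumption).
ZIP_NAME = re.compile(
    r"^(?P<code>EMSR\d+)_(?P<aoi>AOI\d+)_(?P<kind>[A-Z]+)"
    r"_(?P<product>PRODUCT|MONIT\d+)_(?P<revision>r\d+)"
    r"_VECTORS_(?P<version>v\d+)_vector\.zip$"
)


def parse_zip_url(url: str) -> dict:
    """Split the zip file name at the end of a URL into named parts."""
    name = PurePosixPath(urlparse(url).path).name
    match = ZIP_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name}")
    return match.groupdict()


url = (
    "https://emergency.copernicus.eu/mapping/download/184632/"
    "EMSR501_AOI01_DEL_MONIT02_r1_VECTORS_v1_vector.zip"
)
parts = parse_zip_url(url)
print(parts["code"], parts["aoi"], parts["product"])  # EMSR501 AOI01 MONIT02
```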

Notice that this particular EMSR flood code has several vector zip files, all for the same area of interest (AOI01): one PRODUCT file and two MONIT files. These files are present in the ml4cc_data_lake Google Cloud Storage bucket in both their unzipped raw format and their zipped format. They can be found in the following locations:

  • zipped files: gs://ml4cc_data_lake/0_DEV/0_Raw/WorldFloods/copernicus_ems/copernicus_ems_zip

  • unzipped files: gs://ml4cc_data_lake/0_DEV/0_Raw/WorldFloods/copernicus_ems/copernicus_ems_unzip

Saving the Zip Files and Unzipping Locally

To save the zip files locally, you can download them either from ml4cc_data_lake or directly from the Copernicus EMS rapid activations. Since this notebook covers data ingestion from start to finish, the cell below downloads the files from Copernicus EMS and unzips them locally:

from tqdm import tqdm
folder_out = f"{root}/datasets/Copernicus_EMS_raw/{emsr_code}"
utils.create_folder(folder_out)

unzip_files_activation = []
for zip_file in tqdm(zip_files_activation_url_list):
    local_zip_file = activations.download_vector_cems(zip_file, 
                                                      folder_out=folder_out)
    unzipped_file = activations.unzip_copernicus_ems(local_zip_file,
                                                     folder_out=folder_out)
    unzip_files_activation.append(unzipped_file)
  0%|          | 0/3 [00:00<?, ?it/s]
Folder '/home/satyarth934/projects/ml4floods/datasets/Copernicus_EMS_raw/EMSR501' is created.
100%|██████████| 3/3 [00:24<00:00,  8.13s/it]
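
For readers without the repository at hand, the unzip step can be approximated with the standard library. This is only a sketch: unzip_to_folder is a hypothetical stand-in, and the real activations.unzip_copernicus_ems may behave differently. It extracts each archive into a folder named after the zip file, which matches the folder names printed later in this notebook.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to_folder(zip_path: str, folder_out: str) -> str:
    """Extract zip_path into folder_out/<zip stem> and return that folder."""
    target = Path(folder_out) / Path(zip_path).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return str(target)


# Demo on a throwaway archive with a placeholder file (not real EMS data).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo_vector.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("layer.shp", b"placeholder")
    unzipped = unzip_to_folder(str(demo_zip), tmp)
    print(Path(unzipped).name)                             # demo_vector
    print(sorted(p.name for p in Path(unzipped).iterdir()))  # ['layer.shp']
```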

2 - Building EMSR Flood Metadata and Floodmaps

After scraping and downloading the Copernicus EMS Rapid Mapping products in Section 1, we unzipped the files to a local directory or to the Google Cloud Storage bucket.

Once unzipped, multiple .shp files can be associated with a single EMSR Flood Activation Code. These files represent different activation layer data available as Copernicus EMS rapid mapping products. For more details with respect to each of the layers see below.

[Figure: layers of the Copernicus EMS rapid mapping products]

The layers of particular value to WorldFloods are the Metadata (A), Crisis (B), and Base Data (D) layers. More specifically, we are interested in retrieving:

  • A: Source and the Area of Interest (AOI)

  • B: Observed Event data for Floods

  • D: Hydrography areas and lines indicative of lakes and rivers

With these data we can create a composite floodmask over an AOI. We will later obtain Sentinel-2 images associated with an AOI which we will overlay with a floodmap layer derived from A, B, and D Copernicus EMS products.

These shapefiles are composed into a floodmap, from which we generate the ground truth labels used in a supervised learning pipeline.

2a - Process Shapefiles

In the function filter_register_copernicusems we check that all the .shp files follow the expected conventions with respect to timestamp and data availability.

2b - Populate Copernicus EMSR Metadata Dictionary

After extracting the source, AOI, observed event, and hydrography information, filter_register_copernicusems populates a dictionary whose keys describe the source data and whose values hold the filenames and paths of the specific .shp files.
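
To make the idea of the dictionary concrete, the sketch below maps shapefile names to metadata keys by the layer keyword in each name. The keys mirror those visible in the metadata_floodmap output later in this notebook. The matching rule and the names LAYER_KEYS and register_shapefiles are assumptions made for illustration, not the actual implementation of filter_register_copernicusems.

```python
from pathlib import Path

# Layer keyword in the file name -> metadata key (matching rule is assumed).
LAYER_KEYS = {
    "areaOfInterestA": "area_of_interest_file",
    "observedEventA": "observed_event_file",
    "hydrographyA": "hydrology_file",
    "hydrographyL": "hydrology_file_l",
}


def register_shapefiles(filenames):
    """Return {metadata key: filename} for the .shp files of known layers."""
    register = {}
    for filename in filenames:
        if Path(filename).suffix != ".shp":
            continue  # skip sidecar files such as .dbf or .prj
        for keyword, key in LAYER_KEYS.items():
            if f"_{keyword}_" in filename:
                register[key] = filename
    return register


files = [
    "EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1.shp",
    "EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1.dbf",
    "EMSR501_AOI01_DEL_PRODUCT_areaOfInterestA_r1_v1.shp",
    "EMSR501_AOI01_DEL_PRODUCT_hydrographyA_r1_v1.shp",
    "EMSR501_AOI01_DEL_PRODUCT_hydrographyL_r1_v1.shp",
]
register = register_shapefiles(files)
print(register["observed_event_file"])
```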

2c - Generate Floodmap

We then combine the .shp files for the area of interest, the observed event, and the hydrography into a single geopandas.GeoDataFrame object using generate_floodmap. This information will be uploaded to the WorldFloods data bucket for new events.

# folder_out=f"Copernicus_EMS_raw/{activation}"
folder_out = f"{root}/datasets/Copernicus_EMS_raw/{emsr_code}"

code_date = table_activations_ems.loc[emsr_code]["CodeDate"]

registers = []
for unzip_folder in unzip_files_activation:
    metadata_floodmap = activations.filter_register_copernicusems(unzip_folder, code_date)
    if metadata_floodmap is not None:
        floodmap = activations.generate_floodmap(metadata_floodmap, folder_files=folder_out)
        registers.append({"metadata_floodmap": metadata_floodmap, "floodmap": floodmap})
        print(f"File {unzip_folder} processed correctly")
    else:
        print(f"File {unzip_folder} does not follow the expected format. It won't be processed")
File /home/satyarth934/projects/ml4floods/datasets/Copernicus_EMS_raw/EMSR501/EMSR501_AOI01_DEL_MONIT02_r1_VECTORS_v1_vector processed correctly
File /home/satyarth934/projects/ml4floods/datasets/Copernicus_EMS_raw/EMSR501/EMSR501_AOI01_DEL_MONIT01_r1_VECTORS_v1_vector processed correctly
File /home/satyarth934/projects/ml4floods/datasets/Copernicus_EMS_raw/EMSR501/EMSR501_AOI01_DEL_PRODUCT_r1_VECTORS_v1_vector processed correctly
import numpy as np
np.unique(floodmap.source)
array(['area_of_interest', 'flood', 'hydro', 'hydro_l'], dtype=object)
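
The source values above tag each geometry with the layer it came from, which is what the plotting cells rely on when they filter by source. Here is a minimal sketch of that tagging idea, using plain GeoJSON-like dicts instead of geopandas. merge_layers and the toy coordinates are made up for illustration and do not reflect how generate_floodmap is implemented.

```python
def merge_layers(layers):
    """Flatten {source: [geometry, ...]} into one list of tagged features."""
    return [
        {"source": source, "geometry": geometry}
        for source, geometries in layers.items()
        for geometry in geometries
    ]


# Toy geometries (not real EMSR501 data), grouped by layer.
layers = {
    "area_of_interest": [
        {"type": "Polygon", "coordinates": [[(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]]},
    ],
    "flood": [
        {"type": "Polygon", "coordinates": [[(1, 1), (2, 1), (2, 2), (1, 1)]]},
        {"type": "Polygon", "coordinates": [[(2, 2), (3, 2), (3, 3), (2, 2)]]},
    ],
    "hydro_l": [
        {"type": "LineString", "coordinates": [(0, 0), (4, 4)]},
    ],
}

features = merge_layers(layers)
# Selecting one layer is then a filter on the tag, as in the plots.
flood_only = [f for f in features if f["source"] == "flood"]
print(sorted({f["source"] for f in features}))
print(len(flood_only))
```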
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] == "flood"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Flood Maps", linewidth=0.8)
ax.set(
    title="Flood"
)
plt.show()
../../../../_images/full_data_ingest_23_0.png
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] == "hydro_l"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Hydrography lines", linewidth=0.8)
ax.set(
    title="Hydrography"
)
plt.show()
../../../../_images/full_data_ingest_24_0.png
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] != "hydro"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Flood Maps", linewidth=0.8)
ax.set(
    title="Floodmap"
)
plt.show()
../../../../_images/full_data_ingest_25_0.png
metadata_floodmap
{'event id': 'EMSR501_AOI01_DEL_PRODUCT_r1_VECTORS_v1_vector',
 'layer name': 'EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1',
 'event type': 'Flash flood',
 'satellite date': Timestamp('2021-02-12 16:32:21+0000', tz='UTC'),
 'country': 'NaN',
 'satellite': 'Sentinel-1',
 'bounding box': {'west': 19.238301964000073,
  'east': 19.710555657000043,
  'north': 42.095451798000056,
  'south': 41.873487114000056},
 'reference system': {'code space': 'epsg', 'code': '4326'},
 'abstract': 'NaN',
 'purpose': 'NaN',
 'source': 'CopernicusEMS',
 'area_of_interest_polygon': <shapely.geometry.polygon.Polygon at 0x7fe63eb86fd0>,
 'observed_event_file': 'EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1.shp',
 'area_of_interest_file': 'EMSR501_AOI01_DEL_PRODUCT_areaOfInterestA_r1_v1.shp',
 'ems_code': 'EMSR501',
 'satellite_pre_event': 'Open Street Map',
 'timestamp_pre_event': Timestamp('2021-02-12 00:00:00+0000', tz='UTC'),
 'hydrology_file': 'EMSR501_AOI01_DEL_PRODUCT_hydrographyA_r1_v1.shp',
 'hydrology_file_l': 'EMSR501_AOI01_DEL_PRODUCT_hydrographyL_r1_v1.shp'}
metadata_floodmap["area_of_interest_polygon"]
../../../../_images/full_data_ingest_27_0.svg
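
The bounding box stored in the metadata can be turned into a closed coordinate ring, which is the kind of input a polygon constructor expects. This is a sketch only: bbox_to_ring is a hypothetical helper, and activations.generate_polygon, used in the next section, may order the corners differently.

```python
# Values copied from the 'bounding box' entry of metadata_floodmap above.
bounding_box = {
    "west": 19.238301964000073,
    "east": 19.710555657000043,
    "north": 42.095451798000056,
    "south": 41.873487114000056,
}


def bbox_to_ring(bbox):
    """Return a closed (lon, lat) ring, counter-clockwise from the SW corner."""
    west, east = bbox["west"], bbox["east"]
    south, north = bbox["south"], bbox["north"]
    return [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),  # repeat the first point to close the ring
    ]


ring = bbox_to_ring(bounding_box)
print(len(ring), ring[0] == ring[-1])  # 5 True
```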

3 - Sentinel-2 and JRC Permanent Water Imagery Using Google Earth Engine

Sentinel-2 Image Retrieval Using Google Earth Engine

To retrieve Sentinel-2 images from Google Earth Engine, create an account and authenticate before running the cell below. You can sign up on the Google Earth Engine website.

To download Sentinel-2 images we use the module ee_download, and we interact with the map using geemap.eefolium.

We use the AOI polygons retrieved from CopernicusEMS to select the location and time that we are interested in and use Google Earth Engine to do the georeferencing. Notice that we may also render the hydrography and observed flood event.

from datetime import timedelta
from datetime import datetime
import geopandas as gpd
import pandas as pd
import ee
import geemap.eefolium as geemap
import folium
from src.data import ee_download, create_gt
ee.Initialize()

bounds_pol = activations.generate_polygon(metadata_floodmap["area_of_interest_polygon"].bounds)
pol_2_clip = ee.Geometry.Polygon(bounds_pol)


x, y = metadata_floodmap["area_of_interest_polygon"].exterior.coords.xy
pol_list = list(zip(x,y))
pol = ee.Geometry.Polygon(pol_list)

date_event = datetime.utcfromtimestamp(metadata_floodmap["satellite date"].timestamp())

date_end_search = date_event + timedelta(days=20)

img_col = ee_download.get_s2_collection(date_event, date_end_search, pol)
permanent_water_img= ee_download.download_permanent_water(date_event, pol_2_clip)


n_images_col = img_col.size().getInfo()
print(f"Found {n_images_col} S2 images between {date_event.isoformat()} and {date_end_search.isoformat()}")
Map = geemap.Map()

imgs_list = img_col.toList(n_images_col, 0)
for i in range(n_images_col):
    img_show = ee.Image(imgs_list.get(i))
    Map.addLayer(img_show.clip(pol_2_clip), 
                 {"min":0, "max":3000, "bands":["B4","B3","B2"]},f"S2 {i}", True)
# add the permanent water layer once, after all Sentinel-2 layers
Map.addLayer(permanent_water_img, name="permanent water")

geojson_geojson = "geojson_show.geojson"
geojson_filepath = f"{root}/datasets/Copernicus_EMS_raw/{geojson_geojson}"
floodmap.to_file(geojson_filepath, driver="GeoJSON")
Map.add_geojson(geojson_filepath, name="FloodMap")

Map.centerObject(pol)
folium.LayerControl(collapsed=False).add_to(Map)
Map
Found 8 S2 images between 2021-02-12T16:32:21 and 2021-03-04T16:32:21
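
The search window printed above is the satellite acquisition time plus 20 days. The sketch below reproduces that arithmetic with the standard library alone. It uses astimezone instead of datetime.utcfromtimestamp, which is deprecated since Python 3.12. The same calls should work on the pandas Timestamp in the metadata, since it is a datetime subclass, but here a plain datetime with the same value is assumed.

```python
from datetime import datetime, timedelta, timezone

# Acquisition time copied from metadata_floodmap["satellite date"] above.
satellite_date = datetime(2021, 2, 12, 16, 32, 21, tzinfo=timezone.utc)

# Convert to UTC, then drop tzinfo to get a naive UTC datetime,
# which is what the notebook cell works with.
date_event = satellite_date.astimezone(timezone.utc).replace(tzinfo=None)
date_end_search = date_event + timedelta(days=20)

print(f"{date_event.isoformat()} -> {date_end_search.isoformat()}")
# 2021-02-12T16:32:21 -> 2021-03-04T16:32:21
```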
file_path = "gs://ml4cc_data_lake/0_DEV/1_Staging/WorldFloods/S2/EMSR501/AOI01/EMSR501_AOI01_DEL_MONIT01_r1_v1.tif"
import numpy as np
import rasterio
from rasterio import plot as rasterioplt
with rasterio.open(file_path) as src:
    # read image
    image = src.read()
    # convert to RGB image
    rgb = np.clip(image[(3,2,1),...]/3000.,0,1)
    # plot image
    rasterioplt.show(rgb, transform=src.transform)
../../../../_images/full_data_ingest_35_0.png
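
The indices (3, 2, 1) and the division by 3000 in the cell above can be unpacked as follows. The sketch assumes the GeoTIFF stores the 13 Sentinel-2 bands in their usual order, B1 to B12 with B8A after B8; that ordering is an assumption about the file, and scale_reflectance is a hypothetical name.

```python
# Assumed band order of the 13-band Sentinel-2 GeoTIFF (zero-based indices).
S2_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
            "B8", "B8A", "B9", "B10", "B11", "B12"]

# Red, green, blue correspond to B4, B3, B2.
rgb_indices = tuple(S2_BANDS.index(band) for band in ("B4", "B3", "B2"))
print(rgb_indices)  # (3, 2, 1)


def scale_reflectance(value, max_value=3000.0):
    """Map a raw pixel value to [0, 1], clipping anything outside the range."""
    return min(max(value / max_value, 0.0), 1.0)


print(scale_reflectance(1500))  # 0.5
print(scale_reflectance(4500))  # 1.0 (values above 3000 saturate)
```

Dividing by 3000 rather than the full data range brightens the image, at the cost of saturating very bright pixels such as clouds.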